Robust Trajectory Tracking Control for Continuous-Time Nonlinear Systems with State Constraints and Uncertain Disturbances

In this paper, a robust trajectory tracking control method for nonlinear systems with state constraints and uncertain disturbances is proposed based on adaptive dynamic programming (ADP). First, an augmented system is constructed from the tracking error and the reference trajectory, and the tracking control problem with uncertain disturbances is formulated as a robust regulation problem. Then, by considering the nominal system of the augmented system and introducing a discount coefficient into it, the guaranteed cost tracking control problem is transformed into an optimal control problem. A new safe Hamilton–Jacobi–Bellman (HJB) equation is proposed by combining the cost function with the control barrier function (CBF), so that behavior that violates the safety constraints on the system states is penalized. A critic neural network (NN) is used to approximate the solution of this safe HJB equation. Based on Lyapunov stability theory, the system states and the critic NN parameters are guaranteed to be uniformly ultimately bounded (UUB) in the presence of state constraints and uncertain disturbances. Finally, a simulation example verifies the feasibility of the proposed method.


Introduction
With the continued application of automatic driving technology [1,2] and intelligent robot technology [3,4], safety-critical systems have attracted extensive attention. When designing a controller for such systems, safety takes priority over other performance requirements. For control systems with strict safety requirements, the control barrier function (CBF) has been applied as a tool to enforce state constraints.
Reinforcement learning (RL) can be regarded as a technique for policy learning and value evaluation. In practical engineering applications, dynamic programming suffers from the curse of dimensionality; RL-based approximation, also called adaptive dynamic programming (ADP) [5][6][7], can alleviate this problem. ADP is an intelligent control method and an approximation tool for optimal control problems. Because the analytical solution of the Hamilton–Jacobi–Bellman (HJB) equation is generally difficult to obtain, ADP learns the solution of the HJB equation online through neural network (NN) approximation [8][9][10]. A variety of ADP-based control methods have been proposed to address trajectory tracking and optimal control problems [11][12][13][14][15][16].
ADP enables complex nonlinear systems to achieve the desired tracking control goal [17][18][19][20]. In reference [17], the tracking performance of continuous-time nonlinear systems was analyzed under input constraints. In practice, uncertain disturbances must often be taken into account, so robust optimal tracking control has become an active research topic. Solving tracking problems for complex nonlinear systems with uncertainty is more difficult; in reference [18], an adaptive critic technique was used to solve the robust tracking control problem of nonlinear systems with random disturbances. For continuous-time nonlinear systems with matched uncertainty, an effective robust tracking control method was developed in references [19,20], where a discount coefficient was selected for the nominal augmented error system. For systems with disturbances, H∞ tracking control was used [21]. To reduce design cost and resource consumption while maintaining the accuracy of the control system, an event-triggered tracking control method was proposed [22]. For the optimal regulation problem, a new non-quadratic discounted performance function was proposed in reference [23]. In reference [24], an improved adaptive robust tracking method was proposed for uncertain nonlinear systems and successfully applied to a mass-spring-damper system. The tracking control methods above yield feasible control strategies and enable the system to achieve the predetermined control target. However, none of them considers state constraints.
In references [25][26][27][28], different ADP-based methods were proposed to solve various engineering problems. In some environments, the control system must also provide reliable safety. The goal of safe control design is to find a control strategy that conforms to the safety specifications imposed by the physical constraints of the system [29]. Using the CBF to enforce the safety constraints of systems with strict requirements has attracted extensive attention [30][31][32][33]. To drive the system states to a desired equilibrium point, an approximate method for solving safety-constrained optimal regulation problems was proposed in [34], in which the cost of violating the safety constraint was embedded directly into the value function. In reference [35], applications of the CBF were introduced, and verification methods and implementation aspects of safety in safety-critical control systems were summarized. The discrete-time state-constrained problem was studied in reference [36], where the HJB equation with the CBF was solved by exploiting the approximation properties of the neural network.
In this paper, a new guaranteed cost robust tracking method with state constraints and uncertain disturbances is proposed. The method guarantees convergence of the tracking error under uncertain disturbances and state constraints. A discount coefficient is selected for the nominal augmented system with tracking errors, and the CBF is added to the cost to handle the system state constraints. Finally, the approximation property of the critic NN is used to solve the resulting HJB equation. The contributions of this paper are as follows:

1. For robust tracking control problems, the CBF is applied to a tracking control system with uncertain disturbances, so that the system retains good tracking performance under state constraints;
2. By combining the traditional adaptive control method with the CBF, the CBF is extended directly to the original system and serves as a penalty function that punishes unsafe behavior;
3. A new guaranteed cost robust adaptive tracking method with state constraints and uncertain disturbances is proposed, in which the safe HJB equation is solved through a critic NN learning framework, and the critic NN parameters are guaranteed to be UUB under state constraints and uncertain disturbances.
The rest of this article is organized as follows. Section 2 presents the preliminaries and introduces the control barrier function. Section 3 describes the discounted value function for the nominal augmented system and the new cost function obtained after adding the barrier function. Section 4 introduces the learning method of the critic NN with state constraints and uncertain disturbances. In Section 5, the effectiveness of the proposed method is verified by a simulation example. Finally, conclusions are drawn in Section 6.

Problem Statement
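Consider the following uncertain nonlinear safety system:
$$ \dot{x}(t) = f(x(t)) + g(x(t))u(t) + \Delta f(x(t)), \tag{1} $$
where $x(t) \in \Phi \subset \mathbb{R}^n$ is the state vector, $u(t) \in U \subset \mathbb{R}^m$ is the control vector, $\Phi$ is the set of safe feasible states, $U$ is the set of admissible inputs, $f(x(t)) \in \mathbb{R}^n$ and $g(x(t)) \in \mathbb{R}^{n \times m}$ are known functions with $f(0) = 0$, and $\Delta f(x(t)) \in \mathbb{R}^n$ is an unknown perturbation with $\Delta f(0) = 0$. Let the initial state be $x(0) = x_0$. We assume that there exists a constant $g_M$ such that $0 < \|g(x)\| \le g_M$ for all $x \in \mathbb{R}^n$, and that $\Delta f(x) = g(x)d(x)$, where the unknown term $d(x) \in \mathbb{R}^m$ satisfies $\|d(x)\| \le d_M(x)$.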

Assumption 1.
Let the reference trajectory of system (1) be $x_d(t)$, a bounded function generated by the command generator $\dot{x}_d(t) = r(x_d(t))$. Both the reference trajectory $x_d \in \mathbb{R}^n$ and the command function $r \in \mathbb{R}^n$ are Lipschitz continuous.

Control Barrier Function
The CBF provides a further means of handling the state constraints of the system [36]. Within a predefined safe set, a CBF candidate is always positive and tends to infinity at the boundary of the set. Because the derivative of the CBF becomes negative as its value grows near the boundary, the CBF never actually reaches infinity: when the state approaches the safety boundary, the negative derivative drives it back into the safe set, so the system state remains within the predetermined set. Following [34], the safe feasible set $\Phi$, which is formed from operational constraints and safety specifications, its boundary $\partial\Phi$, and its interior $\mathrm{Int}\,\Phi$ are
$$ \Phi = \{x \in \mathbb{R}^n : h(x) \ge 0\}, \tag{2} $$
$$ \partial\Phi = \{x \in \mathbb{R}^n : h(x) = 0\}, \tag{3} $$
$$ \mathrm{Int}\,\Phi = \{x \in \mathbb{R}^n : h(x) > 0\}, \tag{4} $$
where $h$ is a continuously differentiable function of $x$ that encodes the constraint range of the state. The CBF candidate $B(x)$ is a control barrier function if it satisfies the required barrier properties, in which $\alpha_1(\cdot)$, $\alpha_2(\cdot)$, and $\alpha_3(\cdot)$ are Lipschitz class-$\mathcal{K}$ functions.

Assumption 2.
To keep the system states constrained under uncertain disturbances, we use a logarithmic control barrier function $B_r(x)$ that satisfies the required barrier properties. Moreover, $B_r(x)$ is monotonically decreasing for all $x \in \Phi$.
Under Assumption 2, the logarithmic barrier function is defined as
$$ B_r(x) = -\log\left(\frac{\gamma h(x)}{\gamma h(x) + 1}\right), \tag{8} $$
where $\gamma > 0$ is a constant that determines how quickly $B_r(x)$ grows as the state approaches the safety boundary.
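The sketch below evaluates the logarithmic barrier (8) for a single interval constraint $a \le x \le b$, to show how the penalty grows near the boundary. The constraint function $h(x) = (x - a)(b - x)$, positive inside the interval and zero at its endpoints, is an assumption made only for illustration; the bounds $-0.2 \le x_1 \le 0.35$ and $\gamma = 0.02$ are the values used in the simulation section.

```python
import math

def h_interval(x: float, a: float, b: float) -> float:
    """Hypothetical constraint function for a <= x <= b: positive inside, zero on the boundary."""
    return (x - a) * (b - x)

def log_barrier(x: float, a: float, b: float, gamma: float = 0.02) -> float:
    """B_r = -log(gamma*h / (gamma*h + 1)), computed as log(1 + 1/(gamma*h)).

    Returns +inf on or outside the boundary, where h <= 0.
    """
    h = h_interval(x, a, b)
    if h <= 0.0:
        return math.inf
    # log1p(1/(gamma*h)) is algebraically identical to -log(gamma*h/(gamma*h+1)) and
    # stays accurate when gamma*h is large and the ratio would round to 1.
    return math.log1p(1.0 / (gamma * h))

if __name__ == "__main__":
    # Position constraint -0.2 <= x1 <= 0.35 from the simulation section
    for x1 in (0.075, 0.2, 0.3, 0.34, 0.349):
        print(f"x1 = {x1:6.3f}  B_r = {log_barrier(x1, -0.2, 0.35):8.3f}")
```

Writing the barrier as $\log(1 + 1/(\gamma h))$ makes its shape easy to read: it is positive in the interior, decreases as $h$ grows, and diverges as $h \to 0^+$, while a larger $\gamma$ flattens it in the interior of the set.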
Before describing the modified robust tracking method with constraints, we first give the following definition and assumption.
Definition 1. The safe control input set of the nonlinear system (1) is
$$ U_c = \{u : x^u(t) \in \mathrm{Int}\,\Phi, \ \forall t \ge 0\}, \tag{9} $$
where $x^u$ is the system state generated by the control strategy $u$, and $\mathrm{Int}\,\Phi$ is the interior of the set defined in (4).

Assumption 3.
The initial condition of the nonlinear system (1) lies strictly within $\Phi$, that is, $x_0 \in \mathrm{Int}\,\Phi$. The admissible safe input set $U_a = U \cap U_c$ is assumed to be nonempty, and there exists a control strategy $u(x_0) \in U_a$.

Modified Robust Adaptive Tracking Control
The augmented system is constructed by combining the tracking error and the reference trajectory. Define the tracking error as $e_x(t) = x(t) - x_d(t)$. According to (1), the tracking error dynamics are
$$ \dot{e}_x(t) = f(x(t)) + g(x(t))u(t) + \Delta f(x(t)) - r(x_d(t)), \tag{10} $$
with $x(t) = e_x(t) + x_d(t)$, where $r(x_d(t))$ is a Lipschitz continuous function. For the tracking error dynamics (10), the infinite-horizon discounted cost function is [37]
$$ V(e_x(t), x_d(t)) = \int_t^{\infty} e^{-\alpha(\tau - t)}\, U(e_x(\tau), u(\tau))\, d\tau, \tag{11} $$
where $\alpha > 0$ is a discount factor, $U(e_x, u) = e_x^\top Q e_x + u^\top R u$, and $Q \in \mathbb{R}^{n \times n}$ and $R \in \mathbb{R}^{m \times m}$ are symmetric positive definite matrices.
Under state constraints and uncertain disturbances, the guaranteed cost tracking problem is to find a control input $u = u(e_x(t), x_d(t))$ and a positive real number $\bar{V}^*$ such that the tracking error $e_x(t)$ converges to zero and the cost (11) satisfies $V < \bar{V}^*$. Here, $\bar{V}^*$ is called a guaranteed cost, and $u$ is called a guaranteed cost control input.

Remark 1.
The discount term $e^{-\alpha(\tau - t)}$ in (11) is used mainly to keep the cost finite, $V < \infty$, because the control $u(e_x(t), x_d(t))$ contains a component that depends on the reference trajectory $x_d(t)$. If $x_d(t)$ does not converge to zero, this component generally does not vanish, and without the discount term the cost (11) may be unbounded.
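A small numerical illustration of Remark 1, under the simplifying assumption that the reference-dependent part of the control keeps the running cost at a constant level $c > 0$: the undiscounted integral equals $cT$ and grows without bound, whereas the discounted integral approaches $c/\alpha$. The value $\alpha = 0.15$ is the discount factor used in the simulation; $c = 1$ is arbitrary.

```python
import math

def discounted_cost(c: float, alpha: float, horizon: float, steps: int = 100_000) -> float:
    """Trapezoidal approximation of the integral of c * exp(-alpha * tau) over [0, horizon]."""
    dt = horizon / steps
    total = 0.5 * (c + c * math.exp(-alpha * horizon))  # endpoint terms
    for k in range(1, steps):
        total += c * math.exp(-alpha * k * dt)
    return total * dt

if __name__ == "__main__":
    c, alpha = 1.0, 0.15
    for T in (10.0, 50.0, 200.0):
        # The undiscounted cost of a constant running cost is simply c * T
        print(f"T = {T:5.0f}  undiscounted = {c * T:7.1f}  discounted = {discounted_cost(c, alpha, T):.4f}")
    print(f"limit c / alpha = {c / alpha:.4f}")
```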
Define the augmented state $s(t) = [e_x^\top(t), x_d^\top(t)]^\top \in \mathbb{R}^{2n}$. The augmented error dynamics can then be written as
$$ \dot{s}(t) = F(s(t)) + G(s(t))u(t) + \Delta F(s(t)), \tag{12} $$
where
$$ F(s) = \begin{bmatrix} f(e_x + x_d) - r(x_d) \\ r(x_d) \end{bmatrix}, \qquad G(s) = \begin{bmatrix} g(e_x + x_d) \\ 0_{n \times m} \end{bmatrix}, $$
and $\Delta F(s(t)) = G(s(t))d(s(t))$. Since $\|d(x)\| \le d_M(x)$, the uncertain disturbance term also satisfies $\|d(s(t))\| \le d_M(s(t))$, where $d_M(s(t))$ is the bound of $d(s(t))$.
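The block structure of (12) follows from (10) and the command generator $\dot{x}_d = r(x_d)$. Below is a minimal NumPy sketch that assembles $F(s)$ and $G(s)$ from user-supplied $f$, $g$, and $r$. The helper name `make_augmented` and the toy dynamics in the usage example are hypothetical, and the uncertain term $\Delta F$ is left out, so only the nominal part is built.

```python
import numpy as np

def make_augmented(f, g, r, n: int):
    """Return F(s), G(s) for the augmented state s = [e_x; x_d] of dimension 2n.

    f: x -> R^n drift, g: x -> (n x m) input matrix, r: x_d -> R^n command generator.
    """
    def F(s: np.ndarray) -> np.ndarray:
        e, xd = s[:n], s[n:]
        x = e + xd  # recover the physical state x = e_x + x_d
        return np.concatenate([f(x) - r(xd), r(xd)])

    def G(s: np.ndarray) -> np.ndarray:
        e, xd = s[:n], s[n:]
        gx = np.asarray(g(e + xd), dtype=float)  # expected shape (n, m)
        # The input drives only the error dynamics; the reference block gets zeros
        return np.vstack([gx, np.zeros_like(gx)])

    return F, G

if __name__ == "__main__":
    # Toy 2-D example (hypothetical): damped oscillator with the input on the second state
    f = lambda x: np.array([x[1], -x[0] - 0.5 * x[1]])
    g = lambda x: np.array([[0.0], [1.0]])
    r = lambda xd: np.array([xd[1], -xd[0] - xd[1]])
    F, G = make_augmented(f, g, r, n=2)
    s = np.array([-0.35, 0.15, 0.15, 0.25])
    print(F(s), G(s).ravel())
```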

Remark 2.
The augmented system (12) contains the uncertain disturbance term $d(s(t))$, which makes controller design difficult. In what follows, the robust control problem of the augmented error system (12) is converted into the optimal control problem of its nominal system, and the tracking problem with uncertain disturbance is transformed into an optimal regulation problem with a discounted value function.
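Removing the uncertain term $d(s(t))$ from (12) gives the nominal system
$$ \dot{s}(t) = F(s(t)) + G(s(t))u(t). \tag{13} $$
Inspired by references [29,36], $B_r$ is combined with the nominal augmented system (13), and the modified value function is
$$ V(s(t)) = \int_t^{\infty} e^{-\alpha(\tau - t)} \left[ \rho d_M^2(s(\tau)) + s^\top(\tau) Q_T s(\tau) + u^\top(\tau) R u(\tau) + B_r(s(\tau)) \right] d\tau, \tag{14} $$
where $Q_T = \mathrm{diag}\{Q, 0_{n \times n}\}$, $\rho = \lambda_{\max}(R)$ is the maximum eigenvalue of $R$, $Q \in \mathbb{R}^{n \times n}$ and $R \in \mathbb{R}^{m \times m}$ are symmetric positive definite weighting matrices, and $\alpha > 0$ is the discount coefficient. According to Bellman's principle of optimality [38], the minimum of the Hamiltonian associated with the value function (14) and the nominal system (13) satisfies the safe HJB equation
$$ \min_{u \in U_a} H(s, u, V_s) = \min_{u \in U_a} \left\{ \rho d_M^2(s) + s^\top Q_T s + u^\top R u + B_r(s) - \alpha V(s) + V_s^\top \big(F(s) + G(s)u\big) \right\} = 0, \tag{15} $$
where $V_s = \partial V / \partial s$, and the optimal cost function is
$$ V^*(s(t)) = \min_{u \in U_a} \int_t^{\infty} e^{-\alpha(\tau - t)} \left[ \rho d_M^2(s(\tau)) + s^\top(\tau) Q_T s(\tau) + u^\top(\tau) R u(\tau) + B_r(s(\tau)) \right] d\tau. \tag{16} $$
For the system (13) with the barrier-augmented value function (14), setting $\partial H(s, u^*, V_s^*)/\partial u^* = 0$ in (15) gives the optimal control input
$$ u^*(s) = -\frac{1}{2} R^{-1} G^\top(s) V_s^*, \tag{17} $$
where $V_s^* = \partial V^*(s)/\partial s$ and $V^*(s)$ denotes the optimal value of $V(s)$.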
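The stationarity condition behind (17) can be checked numerically: the $u$-dependent part of (15) is $u^\top R u + V_s^\top G(s) u$, whose gradient $2Ru + G^\top(s) V_s$ vanishes at $u^* = -\frac{1}{2}R^{-1}G^\top(s)V_s$. In the sketch, $G$, $R$, and $V_s$ are arbitrary test data (assumptions, not quantities from the paper); because $R$ is positive definite, no perturbation of $u^*$ should lower the Hamiltonian.

```python
import numpy as np

def optimal_control(G: np.ndarray, R: np.ndarray, V_s: np.ndarray) -> np.ndarray:
    """u* = -1/2 R^{-1} G^T V_s, the stationary point of u^T R u + V_s^T G u."""
    return -0.5 * np.linalg.solve(R, G.T @ V_s)

def hamiltonian_u_part(u, G, R, V_s) -> float:
    """Terms of the Hamiltonian in (15) that depend on u."""
    return float(u @ R @ u + V_s @ (G @ u))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    G = rng.normal(size=(4, 2))        # arbitrary (2n x m) input matrix with n = 2, m = 2
    A = rng.normal(size=(2, 2))
    R = A @ A.T + 2.0 * np.eye(2)      # symmetric positive definite weight
    V_s = rng.normal(size=4)           # arbitrary value-function gradient
    u_star = optimal_control(G, R, V_s)
    h_star = hamiltonian_u_part(u_star, G, R, V_s)
    trials = [hamiltonian_u_part(u_star + 0.1 * rng.normal(size=2), G, R, V_s) for _ in range(5)]
    print(h_star, all(h >= h_star for h in trials))
```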

State Constraints Analysis
In designing a robust tracking controller, the CBF serves as a constraint tool that keeps the system states within the specified constraints, so that the system maintains good performance inside the safe set. The CBF thus allows other control objectives to be optimized in safety-critical systems while making explicit the priority of safety over other performance indexes. To show that the CBF remains bounded, its boundedness is established below by considering a sequence of admissible control policies.

Lemma 1.
Consider an admissible feedback control strategy $u_1 \in U_a$. Then there exists a time-invariant positive definite function $Z$ satisfying the conditions below, where $V$ is the value function of the system for all $t \in [0, \infty)$.
Proof. Assume that $V(s, u_1) > 0$ exists and is continuously differentiable, which yields (21). Considering (21), relation (22) also holds, where $P(s, u) = \rho d_M^2(s) + s^\top Q_T s + u^\top R u - \alpha V(s) + B_r(s)$. From (21) and (22), we derive (23), and combining (18), (21), and (23) yields $Z(s(t), u_1) = V(s(t), u_1)$.
This completes the proof.

Lemma 2.
We consider a sequence of positive definite value functions $V(s, t, u_1), V(s, t, u_2), \ldots, V(s, t, u_i)$, abbreviated as $V_1, V_2, \ldots, V_i$, associated with the admissible control inputs $u_1(s, t), u_2(s, t), \ldots, u_i(s, t) \in U_a$. Then the Hamiltonian defined in (15) satisfies the following conditions, and the CBF candidate $B_r^k$ is bounded for $1 \le k \le i$.

Proof.
Assume that $0 \le k \le j \le i$ for any $j$ and $k$ and that $H_{\min}^k \le H_{\min}^j$ holds; this yields a relation in terms of $V_o = V_o(s(t), u_k)$. According to (17), $u^*$ can be rewritten accordingly. Define $T(s) = \rho d_M^2(s) + s^\top Q_T s + B_r(s) - \alpha V(s)$. Using (27) together with $\lim_{t \to \infty} V_o(s(t)) = 0$, and then Lemmas 1 and 2, the remaining bounds are obtained. In this derivation, $Z(s(t), u_k)$ is bounded and $P(s, u)$ is positive definite, so $B_r^k$ is also bounded. In other words, under state constraints, the system states do not reach the safety boundary while tracking the reference trajectory. This proves that the CBF is bounded at every time instant.

Theorem 1. For the performance optimization problem described in (16), let Assumptions 2 and 3 hold. Then, under the control input (17), the tracking states are guaranteed to remain within the safe set for all $t > 0$.
Proof. By Lemmas 1 and 2, the performance function $Z(s, u_k)$ and the CBF candidate $B_r^k$ are bounded at every time instant under the control input (17). By Assumptions 1 and 2, the barrier function $B_r^k$ tends to infinity at the boundary of the constraint set. Since the CBF remains bounded at every instant, the system states never reach the safety boundary.
In the approach above, the CBF is added directly to the cost function, which keeps the system states constrained. This method applies to guaranteed cost robust trajectory tracking control without requiring an initial admissible control law. Traditional tracking controllers usually need one, and even when a suitable initial admissible control law is found, it may not satisfy the state constraints.
Because of the discount term $e^{-\alpha(\tau - t)}$ in (11), closed-loop stability while tracking the reference trajectory is not automatically guaranteed; therefore, a guaranteed cost adaptive critic NN learning framework is designed to ensure it. Before proceeding, we make the following assumption.

Design of Guaranteed Cost Adaptive Critic NN Learning Framework
In this section, the approximation property of the critic NN is used to approximate the solution of the safe HJB equation (15). A guaranteed cost adaptive critic NN learning framework is proposed in which the critic NN weights are updated online, and the critic weight estimation errors are guaranteed to be UUB. For the cost function (16), we design a critic NN to approximate $V^*(s)$ and its gradient as
$$ V^*(s) = W^\top \phi(s) + \varepsilon_v(s), $$
$$ \nabla V^*(s) = (\nabla\phi(s))^\top W + \nabla\varepsilon_v(s), \tag{30} $$
where $W \in \mathbb{R}^l$ is the ideal weight vector of the critic NN, $\phi(s) = [\varphi_1, \varphi_2, \ldots, \varphi_l]^\top \in \mathbb{R}^l$ is the activation function, $l$ is the number of hidden-layer neurons, $\nabla\phi(s)$ is the derivative of $\phi(s)$, $\varepsilon_v(s)$ is the critic NN approximation error, and $\nabla\varepsilon_v(s)$ is its derivative.
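With a critic that is linear in its weights, $\hat{V}(s) = \hat{W}^\top\phi(s)$ and its gradient is $(\nabla\phi(s))^\top\hat{W}$, where $\nabla\phi(s)$ is the $l \times 2n$ Jacobian of the basis. The sketch below uses the ten quadratic terms chosen in the simulation section ($n = 2$, so $s \in \mathbb{R}^4$); the function names are hypothetical.

```python
import numpy as np

# Index pairs (i, j) with i <= j; this ordering reproduces the simulation basis
# [s1^2, s1 s2, s1 s3, s1 s4, s2^2, s2 s3, s2 s4, s3^2, s3 s4, s4^2].
PAIRS = [(i, j) for i in range(4) for j in range(i, 4)]

def phi(s: np.ndarray) -> np.ndarray:
    """Quadratic activation vector of length l = 10."""
    return np.array([s[i] * s[j] for i, j in PAIRS])

def grad_phi(s: np.ndarray) -> np.ndarray:
    """Jacobian of phi with shape (l, 2n) = (10, 4)."""
    J = np.zeros((len(PAIRS), 4))
    for k, (i, j) in enumerate(PAIRS):
        J[k, i] += s[j]
        J[k, j] += s[i]  # when i == j the two updates sum to 2 * s_i
    return J

def critic_value(W_hat: np.ndarray, s: np.ndarray) -> float:
    """V_hat(s) = W_hat^T phi(s)."""
    return float(W_hat @ phi(s))

def critic_grad(W_hat: np.ndarray, s: np.ndarray) -> np.ndarray:
    """grad V_hat(s) = (grad phi(s))^T W_hat, a vector in R^(2n)."""
    return grad_phi(s).T @ W_hat

if __name__ == "__main__":
    s = np.array([-0.35, 0.15, 0.15, 0.25])
    W_hat = np.ones(len(PAIRS))
    print(critic_value(W_hat, s), critic_grad(W_hat, s))
```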
From (15), (16), and (30), the safe HJB equation can be written in terms of the ideal weights as
$$ \rho d_M^2(s) + s^\top Q_T s + u^{*\top} R u^* + B_r(s) - \alpha W^\top \phi(s) + W^\top \nabla\phi(s)\big(F(s) + G(s)u^*\big) = \alpha\varepsilon_v(s) - \varepsilon_{v1}(s), \tag{31} $$
where $\varepsilon_{v1}(s) = \nabla^\top\varepsilon_v(s)\big(F(s) + G(s)u^*\big)$, so the right-hand side is the approximation error of the safe HJB equation. From (17) and (30), we obtain
$$ u^*(s) = -\frac{1}{2} R^{-1} G^\top(s) \big( (\nabla\phi(s))^\top W + \nabla\varepsilon_v(s) \big). \tag{32} $$
Substituting (32) into (31) gives the HJB equation in terms of $W$ and the approximation errors. Since the ideal weight $W$ is unknown, the critic NN approximates the cost function $V^*(s)$ as
$$ \hat{V}(s) = \hat{W}^\top \phi(s), \tag{34} $$
where $\hat{W}$ is the estimate of $W$ and $\hat{V}$ is the estimate of $V^*$. From (15) and (34), the approximate Hamiltonian is
$$ \hat{H}(s, u, \hat{W}) = \rho d_M^2(s) + s^\top Q_T s + u^\top R u + B_r(s) - \alpha \hat{W}^\top \phi(s) + \hat{W}^\top \nabla\phi(s)\big(F(s) + G(s)u\big), \tag{35} $$
and, based on (34), the control input is approximated by
$$ \hat{u}(s) = -\frac{1}{2} R^{-1} G^\top(s) (\nabla\phi(s))^\top \hat{W}. $$
Through (31) and (35), the HJB equation error caused by the critic NN approximation is defined. The critic weight estimation error is defined as $\tilde{W} = W - \hat{W}$. The HJB approximation error is
$$ \varepsilon = \rho d_M^2(s) + s^\top Q_T s + \hat{u}^\top R \hat{u} + B_r(s) + \hat{W}^\top \xi, $$
where $\xi = \nabla\phi(s)\big(F(s) + G(s)\hat{u}\big) - \alpha\phi(s)$. The Lyapunov function candidate $J_1(s)$ is given in Assumption 4, and $\Pi(s, \hat{u})$ is taken as an indicator function. We choose $\hat{W}$ to minimize the squared residual $E = \frac{1}{2}\varepsilon^\top\varepsilon$ and use gradient descent to obtain the critic weight tuning law $\dot{\hat{W}}$ in (41), where $A(s) = \frac{1}{4}\nabla\phi\,(\nabla\phi)^\top \hat{W}\,\big(\theta/(1 + \xi^\top\xi)\big)^\top$, $\beta > 0$ is the learning rate that determines the convergence speed of the critic NN, and $K_1$ and $K_2$ are two tuning parameters. From this tuning law, the dynamics of the weight estimation error $\dot{\tilde{W}}$ are obtained.
Theorem 2. Consider the nominal system (13), the modified value function (14), and the tuning law (41). If Assumptions 1-5 hold, then the critic NN weight error $\tilde{W}$, the system state $x$, and the control input $u^*$ are guaranteed to be UUB.
Proof. Consider the Lyapunov function candidate (43). Its time derivative is $\dot{L} = \dot{L}_V + \dot{L}_W$. The first term $\dot{L}_V$ is evaluated using $\varepsilon_{v1}(s) = \nabla^\top\varepsilon_v(s)\big(F(s) - \frac{1}{2}G(s)R^{-1}G^\top(s)(\nabla\phi(s))^\top W\big)$ and $\sigma = \nabla\phi(s)\big(F(s) + G(s)u\big)$. The second term $\dot{L}_W$ is obtained from (41). Summing $\dot{L}_V$ and $\dot{L}_W$ and defining $Z = [\tilde{W}^\top\theta, \tilde{W}^\top]^\top$ gives a bound on $\dot{L}$ in terms of a matrix $M$. Choosing the tuning parameters $K_1$, $K_2$, and $\gamma$ such that $M > 0$ yields a further bound with $\mu = b + d$, and the Lyapunov derivative $\dot{L}(t)$ is negative whenever the corresponding bound condition holds. Based on Lyapunov theory [39], as long as $K_1$, $K_2$, and $\gamma$ are chosen so that this condition holds, the critic NN weight error $\tilde{W}$, the system state $x$, and the control input $u^*$ are UUB under state constraints and uncertain disturbances, and the nonlinear system (1) is closed-loop stable. This completes the proof.
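The core of the tuning law (41) is gradient descent on $E = \frac{1}{2}\varepsilon^2$ with $\varepsilon = c(s) + \hat{W}^\top\xi$, where $c(s)$ collects the terms of the HJB residual that do not depend explicitly on $\hat{W}$. The sketch implements only the normalized gradient term $\hat{W} \leftarrow \hat{W} - \beta\Delta t\,\xi\varepsilon/(1 + \xi^\top\xi)^2$, a normalization that is common in the ADP literature; it omits the stabilizing terms of (41) that involve $\Pi(s, \hat{u})$, $K_1$, and $K_2$. It also treats $\xi$ as a random, persistently exciting regressor generated from a known target weight, which is a toy assumption: in the actual scheme $\xi$ depends on the state trajectory and on $\hat{u}$. The value $\beta = 1.5$ is the simulation's learning rate; the step $\Delta t = 0.05$ is assumed.

```python
import numpy as np

def critic_step(W_hat, xi, c, beta, dt):
    """One Euler step of the normalized gradient law for E = 0.5 * eps**2."""
    eps = c + W_hat @ xi                         # HJB residual for this sample
    return W_hat - beta * dt * xi * eps / (1.0 + xi @ xi) ** 2

def train(W_true, beta=1.5, dt=0.05, iters=5000, seed=0):
    """Toy run: residuals are generated so that eps = 0 exactly at W_hat = W_true."""
    rng = np.random.default_rng(seed)
    W_hat = np.zeros_like(W_true)
    for _ in range(iters):
        xi = rng.normal(size=W_true.shape)       # persistently exciting regressor (toy)
        c = -W_true @ xi
        W_hat = critic_step(W_hat, xi, c, beta, dt)
    return W_hat

if __name__ == "__main__":
    W_true = np.array([1.0, -2.0, 0.5])
    print(train(W_true))
```

Normalizing by $(1 + \xi^\top\xi)^2$ bounds the effective step along $\xi$ by $\beta\Delta t/4$, so each update contracts the weight error along the current regressor direction regardless of the size of $\xi$.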

Simulation
We consider a spring-mass-damper system with nonlinear properties [22], whose dynamics are given by (56) [24], where $x = [x_1, x_2]^\top \in \mathbb{R}^2$, $x_1$ and $x_2$ are the position and velocity, respectively, $u$ is the force applied to the object, $M$ is the mass of the object, $K$ is the stiffness of the spring with the nonlinearity $K(x) = x^3$, and $C$ is the damping coefficient. The system parameters are $M = 1$ kg and $C = 0.5$ N·s/m. Because a mismatched disturbance may destabilize the system, a matched uncertain disturbance $d(x) = p x_1 \sin(x_2)$ is selected, where $p \in [-1, 1]$ and $d_M(x) = \|x\|$. In the simulation, no initial admissible control law is required. To make the tracking errors converge to zero, a reference trajectory that gradually tends to zero is chosen; it is generated by (57) with the initial condition $x_d(0) = [0.15, 0.25]^\top$. Setting the augmented state vector as $s = [e_x^\top, x_d^\top]^\top = [s_1, s_2, s_3, s_4]^\top$ and combining (56) with (57) gives the augmented system dynamics (58). To constrain the system states in (58), the following control barrier functions are used:
$$ B_{r1}(s_1 + s_3) = -\log\left(\frac{\gamma h(s_1 + s_3)}{\gamma h(s_1 + s_3) + 1}\right), \qquad B_{r2}(s_2 + s_4) = -\log\left(\frac{\gamma h(s_2 + s_4)}{\gamma h(s_2 + s_4) + 1}\right). $$
The state constraints of the system are given as −0.2 ≤ x 1 ≤ 0.35 and −0.15 ≤ x 2 ≤ 0.4, and the parameter γ = 0.02.
In addition, the learning rate is $\beta = 1.5$ and the discount factor is $\alpha = 0.15$. For the approximate optimal control of the nominal part of (58), we choose $Q_T = \mathrm{diag}\{5I_2, 0_{2 \times 2}\}$ and $R = I$, where $I$ denotes an identity matrix of appropriate dimensions. The activation function of the critic NN is chosen as $\phi(s) = [s_1^2, s_1 s_2, s_1 s_3, s_1 s_4, s_2^2, s_2 s_3, s_2 s_4, s_3^2, s_3 s_4, s_4^2]^\top$, and the critic weights are denoted as $\hat{W}_c = [W_{c1}, W_{c2}, \ldots, W_{c10}]^\top$. The initial state is $x(0) = [-0.2, 0.4]^\top$, so the initial tracking error is $e_x(0) = x(0) - x_d(0) = [-0.35, 0.15]^\top$ and the initial augmented state is $s(0) = [-0.35, 0.15, 0.15, 0.25]^\top$. To satisfy the persistency of excitation condition, the exploration noise $\exp(-0.25t)\sin^2(t)\cos(t)$ is added during training of the neural network.
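In the augmented coordinates the physical state is recovered as $x = e_x + x_d$, that is, $x_1 = s_1 + s_3$ and $x_2 = s_2 + s_4$, which is why the barrier functions are evaluated at these sums. The sketch below recovers $x(0)$ from $s(0)$ and checks numerically that the chosen disturbance respects its bound, $|p x_1 \sin(x_2)| \le |x_1| \le \|x\|$ for $p \in [-1, 1]$. The sampling range is arbitrary.

```python
import numpy as np

def state_from_augmented(s: np.ndarray) -> np.ndarray:
    """x = e_x + x_d for s = [e_x1, e_x2, x_d1, x_d2]."""
    return s[:2] + s[2:]

def disturbance(x: np.ndarray, p: float) -> float:
    """Matched uncertainty d(x) = p * x1 * sin(x2), with p in [-1, 1]."""
    return float(p * x[0] * np.sin(x[1]))

def d_bound(x: np.ndarray) -> float:
    """Bound d_M(x) = ||x|| used in the simulation."""
    return float(np.linalg.norm(x))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    xs = rng.uniform(-1.0, 1.0, size=(10_000, 2))
    ps = rng.uniform(-1.0, 1.0, size=10_000)
    ok = all(abs(disturbance(x, p)) <= d_bound(x) for x, p in zip(xs, ps))
    print("bound holds on all samples:", ok)
    print("x(0) =", state_from_augmented(np.array([-0.35, 0.15, 0.15, 0.25])))
```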
The convergence of the critic parameters is shown in Figure 1; after 30 s, they converge to $\hat{W} = [3.3767, 0.9606, 0.8867, 0.7752, 1.9266, 1.0686, 1.105, 1.067, 1.0992, 1.0898]^\top$. Figure 2 shows the control inputs of the system. Figure 3 shows the tracking errors $e_{x1}$ and $e_{x2}$ without state constraints, and Figure 4 shows the tracking errors under state constraints. Figures 5 and 6 show the system tracking the reference trajectory without state constraints, where the system states violate the constraints. Figures 7 and 8 show the system tracking the desired trajectory with state constraints; under state constraints and uncertain disturbances, the system still maintains good tracking performance and the closed-loop system remains stable. In summary, the simulation results demonstrate the effectiveness of the proposed method.
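As an illustration of the approximate control $\hat{u}(s) = -\frac{1}{2}R^{-1}G^\top(s)(\nabla\phi(s))^\top\hat{W}$ with the reported converged weights, the sketch evaluates $\hat{u}$ at $s(0)$. It assumes that the force enters only the velocity equation, so that $G(s) = [0, 1/M, 0, 0]^\top$ with $M = 1$ kg; this input structure is an assumption about (56), which is not reproduced in the text. $R = 1$ as in the simulation, and the exploration noise is omitted.

```python
import numpy as np

# Converged critic weights reported for the simulation (basis order as in phi(s))
W_HAT = np.array([3.3767, 0.9606, 0.8867, 0.7752, 1.9266,
                  1.0686, 1.105, 1.067, 1.0992, 1.0898])
# Index pairs (i, j), i <= j, matching [s1^2, s1 s2, ..., s4^2]
PAIRS = [(i, j) for i in range(4) for j in range(i, 4)]

def grad_V(W: np.ndarray, s: np.ndarray) -> np.ndarray:
    """(grad phi(s))^T W for the quadratic basis, accumulated term by term."""
    g = np.zeros(4)
    for w, (i, j) in zip(W, PAIRS):
        g[i] += w * s[j]
        g[j] += w * s[i]
    return g

def u_hat(s: np.ndarray, M: float = 1.0, R: float = 1.0) -> float:
    """u_hat = -1/2 R^{-1} G^T grad V_hat, with an ASSUMED input vector G = [0, 1/M, 0, 0]^T."""
    G = np.array([0.0, 1.0 / M, 0.0, 0.0])
    return float(-0.5 / R * (G @ grad_V(W_HAT, s)))

if __name__ == "__main__":
    s0 = np.array([-0.35, 0.15, 0.15, 0.25])
    print(f"u_hat(s(0)) = {u_hat(s0):.6f}")
```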

Conclusions
This paper presented an ADP-based robust trajectory tracking method for nonlinear systems with state constraints and uncertain disturbances. First, the tracking error was combined with the reference trajectory to construct the augmented system, and the nominal system of the augmented system was considered. To handle the uncertain disturbances of the augmented system, a discount coefficient was introduced into the nominal system, and the CBF was added to the discounted cost of the nominal system to constrain the system states. The cost function and control strategy were then learned through a guaranteed cost adaptive critic NN learning framework. Finally, the simulation results demonstrated that the proposed method makes the tracking error converge while keeping the states within the constraints. In future work, we will extend the state constraint method to discrete-time tracking control systems and multi-agent systems.